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ABSTRACT 

Recently sparse coding have been highly successful in im¬ 
age classification mainly due to its capability of incorporating 
the sparsity of image representation. In this paper, we pro¬ 
pose an improved sparse coding model based on linear spatial 
pyramid matching(5'PM) and Scale Invariant Feature Trans¬ 
form {SIFT ) descriptors. The novelty is the simultaneous 
non-convex and non-negative characters added to the sparse 
coding model. Our numerical experiments show that the im¬ 
proved approach using non-convex and non-negative sparse 
coding is superior than the original ScSPM^ on several typ¬ 
ical databases. 

Index Terms — Image classification, Non-convex and 
non-negative sparse coding , SPM, Iterative support detection 

1. INTRODUCTION 

In recent years, image classification, as an important part of 
image processing problems, has been a research focus in com¬ 
puter vision. Support Vector Machines (SVMs) using Spa¬ 
tial Pyramid Matching (SPM) have been highly successful 
in image classification. SPM model, as a feature extraction 
method, is extended from bag-of-words(BoW) model, which 
is a document representation method in the field of informa¬ 
tion retrieval. This model treats documents as some keywords 
combinations, ignores the syntax and sequence of the texts, 
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finally matches the documents with the frequencies of key¬ 
words. Researchers have applied this method onto image pro¬ 
cessing and turned up the bag-of-features(BoF), which repre¬ 
sents an image as a histogram of its local features. 

Unfortunately, the method is incapable of capturing 
shapes or locating an object, because of this attribute discard¬ 
ing the spatial order of local features. In order to overcome 
this issue, spatial pyramid matching (SPM) is proposed, as 
the most successful extension of the BoF model by incorpo¬ 
rating the geometric correspondence search, discriminative 
codebook learning and the generative part model. The SPM 
method partitions an image into 2* x 2^ segments in different 
scales I = 0,1, 2, and computes the BoF histogram to form 
a vector representation of the image. In cases where only the 
scale I = 0, SPM reduces to BoF. Due to the utility of the 
spatial information, SPM method is more efficient than BoF 
and also has shown very promising performance on many 
image classification issues. 

Figure [His a typical flowchart that clearly illustrates the 
SPM approach. The (a) part is the flow chart of the linear 
SPM methods. The flow contains SIFT extraction, sparse 
coding, spatial pooling and classification. The (b) part is our 
flowchart that is based on sparse coding model. Firstly, in 
the descriptor layer, feature points like SIFT descriptors can 
be extracted from the input image. Then a codebook is ap¬ 
plied to quantize each descriptor and obtain the code layer. 
In the next SPM layer, multiple codes from inside each sub- 
region are pooled together by averaging and normalizing into 
a histogram. Finally, the histograms from all sub-regions are 
concatenated together to generate the final representation of 
the image for the following classification task. 
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sparse coding followed multi-scale max pooling has no con¬ 
straints on the sign of coding coefficients. In order to avoid 
the loss of some information loss in the following max pool¬ 
ing, the non-zero components are conditioned to non-negative 
in our sparse coding model, as did in l|3- Considering that the 
sparsity of images representations is also essential|[T]|2l|6l, we 
employ non-convex algorithm-ISD||5l, in order to get more 
sparse representations. In sum, we propose non-convex and 
non-negative sparse coding and we call the improved ScSPM 
as NNScSPM. 

The remainder of the paper is organized as follows. Sec¬ 
tion 2 introduces the basic idea of of ScSPM and describes 
our proposed NNScSPM. Section 3 presents experiment re¬ 
sults. Finally, section 4 concludes our paper. 

2. LINEAR SPM USING SIFT NON-CONVEX AND 
NON-NEGATIVE SPARSE CODES 

There are two basic feature extraction methods in image clas¬ 
sification: vector quantization and sparse coding. In this sec¬ 
tion, we introduce these two models firstly, and then we pro¬ 
pose our model and algorithm. 


(a) Linear ScSPM 


(b) Linear NNScSPM 


2.1. Vector Quantization (VQ) 


Fig. I. Schematic comparison of the original ScSPM with our 
proposed NNScSPM, which is an improved algorithm based 
on the former. 


It has been empirically found that, traditional SPM has to 
use classifiers with a particular type of nonlinear Mercer ker¬ 
nels, e.g. the intersection kernel or the Chi-square kernel for 
achieve good performance. Accordingly, the cost of nonlin¬ 
ear classifiers, bearing 0{n?) , computational complexity in 
training and 0(n) for testing in SVM(n is the number of sup¬ 
port vectors), are very expensive. So it’s impractical for large 
scale real-world applications. 

To settle this issue, Yang et aim proposed a method 
called linear spatial pyramid matching using sparse cod- 
ing(ScSPM), which achieves state-of-the-art performance on 
several databases in image categorization experiments. In 
the following years, some improved methods based on Sc¬ 
SPM are introduced, like LLC^ and LR-ScSPMW\~ which 
mainly take into account the features’s locality in the feature 
quantization process based on sparse coding to improve the 
accuracy of image classification. 

In this paper, we propose an improved image classifica¬ 
tion algorithm based on ScSPM, which can apply non-convex 
and non-negative properties admirably to image sparse repre¬ 
sentation to improve the accuracy of image classification. It 
is well known that there are two fundamental steps in image 
representation. One is coding, where sparse coding is widely 
used now, and the other is spatial pooling, which can mainly 
get the feature representation of images. However, traditional 


Let A be a set of SIFT appearance descriptors in a L dimen¬ 
sional feature space, i.e. X = [xi, a; 2 ,..., S 

The vector quantization(VQ) method applies the K-means 
algorithm ifTTll to solve the following problem: 


M 

min > min I \xm 

D ^ 1=1...p 
m—1 


di\? 
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where D = [di,..., are the p cluster centers to be found, 
called codebook or dictionary, and || • || denotes the ^ 2 -norm 
of vectors. The optimization problem can be rewritten into a 
matrix factorization problem with cluster membership indica¬ 
tors A = aM]^; 

M 

minA,D E \\t^m 11 ( 2 ) 

m—1 

s.t. Card{am) = 1, \o.m\ = IjCim > 0 ,Vto 

where Card{am) = 1 is a cardinality constraint, meaning 
that only one element of am is nonzero; am > 0, means 
that all the elements of am are nonnegative, and \am \ is the 
^ 1 —norm of am, i e., the summation of the absolute value of 
each element in am- After the optimization problem (|2l) is 
solved, the index of the only nonzero element in am indicates 
which cluster the vector Xm belongs to. 

The objective of Eq.(2) is non-convex and the minimiza¬ 
tion is always alternatively performed with respect to the la¬ 
bels A with D fixed, and with respect to D with the labels 
fixed IfTTll . This phase is called training. After that, the coding 








































phase will be going. The learned D will be applied or tested 
on a new set of X and in such cases, the problem (|2l) will be 
solved with respect to A only. 

2.2. Sparse Coding 

Because of restrictive the constraint Card{am) = 1 , which 
makes the vector Xm responding to only a element of the 
codebook, the reconstruction of X may be too coarse. To 
overcome this issue, Kai Yu et al. im relaxed the constraint by 
instead putting an Zi—norm regularization on am, which en¬ 
forces am to have a small number of nonzero elements. Then 
the VQ formulation is turned into another problem known as 
sparse coding (SC): 

M 

min ^ -f A|Q:y 77 ,| (3) 

A D 

ra—\ 

Where A is a trade-off parameter for balancing the fidelity 
term and the sparse regularization term. In order to avoid 
trivial solutions, a unit Z 2 —norm constraint on am is typically 
applied. Normally, the dictionary D is an overcomplete basis 
set, i.e. P >- L. Solving Q is similar to that of the problem 
(|2|i consisting of a training phase and coding phase. 

2.3. Our model and algorithm for Sparse Coding 

2.3.1. Our Model: Non-convex and Non-negative Sparse 
Coding 

Because of less restrictive constraint, SC coding can achieve 
a much lower reconstruction error than VQ coding. Research 
in image statistics has clearly disclosed that natural image 
patches are sparse signals, and the sparse representation can 
help capture salient properties of images. 

The SC model is based on Zi—norm, which is a popu¬ 
lar sparsity enforcement regularization due to its convexity. 
However, the non-convex sparse regularization such as the 
widely used Zp—norm (0 < p < 1), prefers an even more 
sparse solution. Although there have existed many works 
based on non-convex penalization for image processing, to 
our best knowledge, there have existed only few specific 
works for image classification. The major difficulties with 
the existing non-convex algorithms are that the global optimal 
solution cannot be efficiently computed for now, the behavior 
of a local solution is also hard to analyze and more seriously 
the prior structural information of the solution is hard to be 
incorporated. 

SC model in IH followed by max pooling did not use non¬ 
negative constraint. It’s well known that in order to satisfy the 
optimization, negative coefficients are usually needed. Be¬ 
cause non-zero components typically reflect remarkable fea¬ 
ture information, max pooling will bring the loss in terms of 
these negative components, and moreover reduce the accurate 
of image classificationl^. 


Taking those two situation account, this paper proposes 
a non-convex and non-negative sparse model based on the 
convex sparse coding and proposes a multistage convex re¬ 
laxation algorithm for it via our proposed iterative support 
detection Is], which has proved to be more sparse than the 
pure Zi solution in theory in many cases. Our new sparse 
model is a truncated f 1 model with nonnegative constraints as 
follows: 

M p 

miuA.D E (ll^m -^0^771,11 X'^^Wmjcl^mk l)(4) 

m—1 k—1 

s.t. am > 0,Vm 

where Wm^ are the 0 — 1 weights, i.e. the components of a 
corresponding to 0 weights are removed out of the penalty, 
and will not be penalized 0. 

While ones commonly consider the plain £1 model: 
mina |a; - Da\f -f AX]fc=i |amj> i-e-, all the weights Wm^ 
are the same and equal to 1, in practice, we might be able to 
obtain more prior information beyond the sparsity assump¬ 
tion. For example, we might know the locations of certain 
coefficients am^ of large nonzero absolute values. In such 
cases, we should not penalize these coefficients in the Zi 
norm regularization and remove them out of the £1 norm 
(corresponding w weights being 0) and use a truncated £1 
norm 0. 


2.3.2. Our Algorithm: ISD-YAlll algorithm 

The difficulty is that this kind of partial support information 
is not available beforehand in practice. Correspondingly, in 
this paper, we propose to take advantage of the idea of the 
Iterative Support Detection (ISD, for short) in 0 to extract 
the reliable information about the true solution and set up 0 — 
1 weights correspondingly. The procedure is an alternating 
optimization, which repeatedly take the following two steps: 

Step 1: Optimize a with Wm^ fixed (initially 1): this is a 
convex problem in a. 

Step 2: Determine the value of Wmk according to the cur¬ 
rent a. The value of weights will be used in Step 1 of the next 
iteration. 

For Step 1, the truncated nonnative £1 —norm optimization 
problems can be efficiently solved via the Yalll algorithms 
nni. For Step 2, the locations of large nonzeros are estimated 
from the solution of the last (truncated) £i-norm optimization 
problem via the support detection procedure. Our algorithm 
is named as ISD-Yalll, and described as follows: 


Algorithm The Proposed ISD-Yalll Algorithm 
Given X extracted an image and the codebook D. 
l.Set the iteration itr ^ 0 and initialize 
the set of detected entries ^ 0. 

2. while the stopping condition is not satisfied, 

(a) w**’’ ^ (/(**’■))C .= o \ 

(b) solve truncated li minimization for a = 

(Step 2; using Yalll method) 

(c) ^ support detection using as the reference; 
(Step 1: using threshold-lSD strategy) 

(d) itr + 1. 

Here denotes the universal set of (i, j, € 

{1,2,..., L}, I G X. The support detection, i.e. the set of the 
nonzeros of large magnitudes, are estimated as follows 0. 

I(^tr+l) :={(^,J' (5) 

itr = 0,1, 2,... 

In pooling phase, because of our non-negative model, the 
pooling function is defined as Zj =max{Qfij, Q; 2 j,..., asj}^ 
where Zj is the j-th element of z, which is used in linear SVM 
classifier for image classification, is the matrix element at 
i-th row and j-th column of A, and S is the number of local 
descriptors in the region. 

3. EXPERIMENT RESULTS 

In this section, we report the comparisons results between our 
proposed non-convex and non-negative model and ScSPM on 
two widely used public datasets: 15 scenes lfT9l and UIUC- 
Sport dataset ll20l . Experiments’s parameters setting will be 
analyzed in this section. Besides our own implementations, 
we also quote some results directly from the literature, espe¬ 
cially those of ScSPM from IT] and 0. All the experiments 
were performed under Windows 7 and Matlab (R2013b) run¬ 
ning on a desktop with an Inter(R)CPU i5-4590(3.3GHZ) and 
8GB of memory. 


uses spatial-pyramid histograms and Chi-square kernel slH. 
NScSPM is the non-negative sparse model which only use 
the non-negative constraint, not use the non-convex. 

3.2. 15 Scene Data Set 

Scene 15 contains 15 categories and 4485 images in all, with 
200 to 400 images per category. The image content is diverse, 
containing not only indoor scene, such as bedroom, kitchen, 
but also outdoor scene, such as buildings and country etc. To 
compare with others work, we randomly select 100 images 
per class as training data and use the rest as test data. The 
detailed comparison results are shown in the Table 1. 

Table 1. Performance Comparison on 15 Scene Dataset 


Method 

Average Classification Rate(%) 

KSPM 

76.73±0.65 

ScSPMlD 

80.28±0.93 

NScSPM 

81.30±0.53 

NNScSPM 

81.92±0.42 


3.3. UlUC-Sport Data Set 

UlUC-Sport contains 8 categories and 1792 images in all, and 
the image number ranges from 137 to 250. These 8 categories 
are badminton, bocce, croquet, polo, rock climbing,rowing, 
sailing and snow boarding. We also randomly select 70 im¬ 
ages from each class as training data and use the rest as test 
data. We list our results in Table 2. 


Table 2. Results on UlUC-Sport Dataset 


Method 

Average Classification Rate(%) 

ScSPM 

82.85±0.62 

NScSPM 

83.53±0.72 

NNScSPM 

84.13±0.37 


3.1. Parameters Setting 

Local features descriptor is essential to image representation. 
In our work, we also adopt the widely used SIFT feature 
due to its excellent performance in image classification. To 
fairly compare with others, we use 50,000 SIFT descriptors 
extracted from random patches to train the codebook which 
is same as III in the train phase. In sparse coding phase, the 
most important two parameters are (i): the sparsity regular¬ 
ization parameter A. The performance is best in ScSPMfTl 
when it is 0.2-0.4. We follow the same setting of the in¬ 
terval (0.2,0.4). (ii): the threshold value, which is used for 
computing the weights of coefficients, we take the threshold 
as followingie^®*’’) = max(a®‘’')//?®*’’+^, where the perfor¬ 
mance is good when /3 is 1.3 — 1.5 empirically. In our exper¬ 
iments, we compare our results to ScSPM and KSPM, which 


From the results, we have observed two points that: (i) 
non-convex and non-negative properties can play an impor¬ 
tant role on image classification indeed and (ii) our proposed 
NNScSPM is superior than KSPM and ScSPM on 15 scene 
dataset and UlUC-Sport dataset. 

4. CONCLUDES 

We propose a non-convex and non-negative sparse coding 
model for image classification in this paper, which is effi¬ 
ciently solved by proposed ISD-Yalll algorithm. The non- 
convex property reflects the sparsity of the image and the non¬ 
negative property can avoid the loss in max spatial pooling. 
Our numerical experiments effectively demonstrates its better 
performance. 
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